Ascertaining the existence of hidden objects in a complex system, objects that cannot be observed from the
external world, not only is curiosity-driven but also has significant practical applications. Generally, 
uncovering a hidden node in a complex network requires successful identification of its neighboring nodes, 
but a challenge is to differentiate its effects from those of noise. We develop a completely data-driven, 
compressive-sensing based method to address this issue by utilizing complex weighted networks with 
continuous-time oscillatory or discrete-time evolutionary-game dynamics. For any node, compressive 
sensing enables accurate reconstruction of the dynamical equations and coupling functions, provided that 
time series from this node and all its neighbors are available. For a neighboring node of the hidden node, this
condition cannot be met, resulting in abnormally large prediction errors that, counterintuitively, can be 
used to infer the existence of the hidden node. Based on the principle of differential signal, we demonstrate 
that, when strong noise is present, insofar as at least two neighboring nodes of the hidden node are subject to 
weak background noise only, unequivocal identification of the hidden node can be achieved. 



When dealing with an unknown complex system that has a large number of interacting components 
organized hierarchically, curiosity demands that we ask the following question: are there hidden 
objects that are not accessible from the external world? The problem of inferring the existence of 
hidden objects from observations is quite challenging but it has significant applications in many disciplines of 
science and engineering. Here by "hidden" we mean that no direct observation of or information about the object 
is available, and so it appears to the outside world as a black box. However, due to the interactions between the 
hidden object and other observable components in the system, it may be possible to utilize "indirect" information
to infer the existence of the hidden object and to locate its position with respect to objects that can be observed. 
The difficulty of developing effective solutions is compounded by the fact that the indirect information on which any
method of detecting hidden objects relies can be subtle and sensitive to changes in the system or in the environment.
In particular, in realistic situations noise and random disturbances are present. It is conceivable that the
"indirect" information can be mixed up with that due to noise or be severely contaminated. The presence of noise 
thus poses a serious challenge to detecting hidden nodes, and some effective "noise-mitigation" method must be 
developed. 

To formulate the problem in a concrete way and to gain insights into the development of a general methodo- 
logy, we note that the basic principle underlying the detection of hidden objects is that their existence typically 
leads to "anomalies" in the quantities that can be calculated or deduced from observation. Simultaneously, noise, 
especially local random disturbances applied at the nodal level, can also lead to large variance in these quantities. 
This is because a hidden node is typically connected to a few nodes in the network that are accessible to the
external world, and a noise source acting on a particular node in the network may also be regarded as some kind of 
hidden object. Thus, the key to any detection methodology is to identify and distinguish the effects of hidden 
nodes on measures for detection from those due to local noise sources. 

In this paper, we focus on complex networks and develop a general method to differentiate hidden nodes from 
local noise sources. This problem is intimately related to the works on reverse engineering of complex networks, 
where the goal is to uncover the full topology of the network based on measured time series. Our method is
based on the recent work on utilizing compressive sensing to detect hidden nodes in the absence of noise
sources. To explain our method in a concrete setting, we use the network configuration shown schematically in 
Fig. 1, where there are 20 nodes, the couplings among the nodes are weighted, and the entire network is in a noisy 
environment, but a number of nodes also receive relatively strong random driving. We assume an oscillator 
network so that the nodal dynamics are described by nonlinear differential equations, and that time series can be 



SCIENTIFIC REPORTS | 4 : 3944 | DOI: 10.1038/srep03944







Figure 1 | An example of a complex network with a hidden node. Time
series from all nodes except hidden node #20 can be measured. The hidden
node can be detected when its immediate neighbors, nodes #3 and #7, are
unambiguously identified. Nodes #7, #11, and #14 are driven by local noise
sources.

measured simultaneously from all nodes in the network except one, 
labeled as #20, which is a hidden node. The task of ascertaining the 
presence and locating the position of the hidden node is equivalent
to identifying its immediate neighbors, which are nodes #3 and #7 in 
Fig. 1. Note that, in order to be able to detect the hidden node based 
on information from its neighboring nodes, the interactions between 
the hidden node and its neighbors must be directional from the 
former to the latter or be bidirectional. Otherwise, if the coupling 
is solely from the neighbors to the hidden node, the dynamics of the 
neighboring nodes will not be affected by the hidden node and, 
consequently, time series from the neighboring nodes will contain
absolutely no information about the hidden node, which is therefore 
undetectable. The action of a local noise source on a node is naturally
directional, i.e., from the source to the node. 

Our recent work demonstrated that, when the compressive-sensing
paradigm is applied to uncovering the network topology, the
predicted linkages associated with nodes #3 and #7 are typically 
anomalously dense, and this piece of information is basically what 
is needed to identify them as the neighboring nodes of the hidden 
node. In addition, when different segments of measurement data are 
used to reconstruct the coupling weights for these two nodes, the 
reconstructed weights associated with these two nodes exhibit sig- 
nificantly larger variances than those associated with other nodes. 
However, the predicted linkages associated with the nodes driven by 
local noise sources can exhibit behaviors similar to those due to the 
hidden nodes, leading to uncertainty in the detection of the hidden 
node. Addressing this critical issue is essential to developing algorithms
for real-world applications, which is the aim of this paper. Our
main idea is to exploit the principle of differential signal to study the 
behavior of the predicted link weights as a function of the data used in 
the reconstruction. Due to the advantage of compressive sensing, the 
required data amount can be quite small and, hence, even if our 
method requires systematic increase of the data amount, it will still 
be reasonably small. We shall argue and demonstrate that, when the 
various ratios of the predicted weights associated with all pairs of 
links between the possible neighboring nodes and the hidden node 
are examined, those associated with the hidden nodes and nodes 
under strong local noise show characteristically distinct behaviors, 
enabling unambiguous identification of the neighboring nodes of
the hidden node. Any such ratio is essentially a kind of differential 
signal, because it is defined with respect to a pair of edges. 



Results 

We present our results by using coupled oscillator networks. (Results
from evolutionary-game dynamical networks are presented in
Supporting Information.) Given such a networked system, we use
compressive sensing to uncover all the nodal dynamical equations
and coupling functions. This can be done by expanding all the
vector fields and functions into series and calculating, from available
time series, all the coefficients in the expansion. The expansion base
needs to be chosen properly so that the number of non-zero coefficients
is small as compared with the total number $N_t$ of unknown
coefficients. All $N_t$ coefficients constitute a coefficient vector to be
estimated. The amount of data used can be conveniently characterized
by $R_m$, the ratio of the number $M$ of data points used in the
reconstruction to $N_t$. See Methods.

Our idea to distinguish the effects of a hidden node from those of local noise
sources is based on the following observation. Consider two neighboring
nodes of the hidden node, labeled as $i$ and $j$. Because the
hidden node is a common neighbor of nodes $i$ and $j$, the couplings
from the hidden node should be approximately proportional to each
other, with the proportionality constant determined by the ratio of their
link weights with the hidden node. When the dynamical equations of
nodes $i$ and $j$ are properly normalized, the terms due to the hidden
node tend to cancel each other, leaving the normalization constant as
a single unknown parameter that can be estimated subsequently. We
name this parameter the cancellation ratio and denote it by $\Omega_{ij}$. As the
data amount is increased, $\Omega_{ij}$ tends to its true value. Practically, we
then expect to observe systematic changes in the estimated value of
the ratio as the amount of data used in the compressive-sensing algorithm is
increased from some small to relatively large amount. If only local
noise sources are present, the ratio should show no systematic change
with the data amount. Thus the distinct behaviors of $\Omega_{ij}$ as the
amount of data is increased provide a way to distinguish the hidden
node from noise and, at the same time, to ascertain the existence of
the hidden node. A mathematical formulation of this general principle
can be found in Methods.
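The cancellation idea can be written out in a simplified scalar form. The block below is only a sketch under assumptions that are not stated at this point in the text: each of the two nodes receives the hidden node's signal $x_h$ through a single scalar weight ($w_{ih}$ and $w_{jh}$), the symbols $A_i$ and $A_j$ are shorthand introduced here for all terms that can be built from accessible time series, and the ratio is taken to be $\Omega_{ij} = w_{ih}/w_{jh}$. The formulation in Methods remains the authoritative one.

```latex
\begin{aligned}
\dot{x}_i &= A_i + w_{ih}\,x_h, \\
\dot{x}_j &= A_j + w_{jh}\,x_h, \\
\dot{x}_i - \Omega_{ij}\,\dot{x}_j
  &= A_i - \Omega_{ij}A_j + \left(w_{ih} - \Omega_{ij}\,w_{jh}\right)x_h \\
  &= A_i - \Omega_{ij}A_j
  \quad\text{when}\quad \Omega_{ij} = \frac{w_{ih}}{w_{jh}} .
\end{aligned}
```

In this simplified form the unobservable $x_h$ drops out of the combined equation, and $\Omega_{ij}$ is the only additional unknown to be estimated from data.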

We test our method to differentiate hidden nodes and noise using
random networks of nonlinear/chaotic oscillators. To be concrete,
we choose the nodal dynamics to be that of the Rössler oscillator, one
of the classical models in nonlinear dynamics,

$$[\dot{x}_i, \dot{y}_i, \dot{z}_i] = [-y_i - z_i,\; x_i + 0.2 y_i,\; 0.2 + z_i(x_i - 5.7)],$$

which exhibits a chaotic attractor. The size of the network varies from 
20 to 100, and the probability of connection between any two nodes is
0.04. The network link weights are uniformly distributed in [0.1, 0.5]
(arbitrary). Background noise of amplitude $\xi$ is applied (independently)
to every oscillator in the network, with amplitude varying
from lO"* to 5 X lO""*. The noise amplitude is thus smaller than the 
average coupling strength of the network. The tolerance parameter $\varepsilon$
in the compressive sensing algorithm can be adjusted in accordance
with the noise amplitude (see Supporting Information for details).
Time series are generated by using the standard Heun's algorithm
to integrate the stochastic differential equations. To approximate the
velocity field, we use third-order polynomial expansions in the compressive-sensing
formulation. (In Supporting Information, we present
more examples using network systems of varying sizes, different
weight distributions and topologies, and alternative nodal
dynamics.)
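The simulation setup described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the time step `dt`, the default noise amplitude, the random initial conditions, and the diffusive form of the z-to-x coupling are all assumptions made here for concreteness.

```python
import numpy as np

def rossler_field(state):
    """Rossler vector field for an (N, 3) array of [x, y, z] states."""
    x, y, z = state[:, 0], state[:, 1], state[:, 2]
    return np.column_stack((-y - z, x + 0.2 * y, 0.2 + z * (x - 5.7)))

def drift(state, W):
    """Nodal dynamics plus linear z -> x coupling; W[i, j] is the weight of link j -> i."""
    f = rossler_field(state)
    z = state[:, 2]
    # Assumed diffusive coupling sum_j W_ij (z_j - z_i), added to the x-equation.
    f[:, 0] += W @ z - W.sum(axis=1) * z
    return f

def heun_step(state, W, dt, noise_amp, rng):
    """One stochastic Heun (predictor-corrector) step with additive noise."""
    dW = np.sqrt(dt) * rng.standard_normal(state.shape)
    f0 = drift(state, W)
    predictor = state + f0 * dt + noise_amp * dW
    f1 = drift(predictor, W)
    return state + 0.5 * (f0 + f1) * dt + noise_amp * dW

def simulate(W, n_steps, dt=0.01, noise_amp=1e-3, seed=0):
    """Integrate the noisy network and return an (n_steps, N, 3) trajectory."""
    rng = np.random.default_rng(seed)
    state = rng.uniform(-1.0, 1.0, size=(W.shape[0], 3))
    traj = np.empty((n_steps, W.shape[0], 3))
    for k in range(n_steps):
        state = heun_step(state, W, dt, noise_amp, rng)
        traj[k] = state
    return traj
```

A hidden node can be emulated with such a trajectory by simply withholding one node's columns before reconstruction.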

Detecting hidden node from time series. As a concrete example, we 
consider the network in Fig. 1, where only background noise is 
present and there are no local noise sources. Linear coupling 
between any pair of connected nodes is from the z-component to
the x-component in the Rössler system. From the available time
series (nodes #1-19), we solve for the coefficient vector using a
standard compressive-sensing algorithm [http://users.ece.gatech.edu/justin/l1magic/].
In particular, for node $i$, the terms associated
with couplings from the z-components of other nodes appear in the
$i$th row of the coupling matrix. As shown in Fig. 2(a), when the data
amount is $R_m = 0.7$, the network's coupling matrix can be predicted.
The predicted links and the associated weights are sparse for all nodes
except for nodes #3 and #7, the neighbors of the hidden node. While
there are small errors in the predicted weights due to background
noise, the predicted couplings for the two neighbors of the hidden
node, which correspond to the 3rd and the 7th row in the coupling
matrix, appear to be from almost all other nodes in the network and
some coupling strengths are even negative. Such anomalies associated
with the predicted coupling patterns of the neighboring nodes of the 
hidden node cannot be removed by increasing the data amount. 
Nonetheless, it is precisely these anomalies which hint at the 
likelihood that these two "abnormal" nodes are connected with a 
hidden node. 

While abnormally high connectivity predicted for a node is a likely
indication that it belongs to the neighborhood of the hidden node, in
complex networks there are hub nodes with abnormally large
degrees, especially for scale-free networks. In order to distinguish
a hidden node's neighboring node from some hub node, we use the 
variance of the predicted coupling constants, which can be calculated 
from different segments of the available data sets. Due to the intrins- 
ically low-data requirement associated with compressive sensing, the 
calculation of the variance is feasible because any reasonable time 
series can be broken into a number of segments, and prediction can 
be made from each data segment. For nodes not in the neighborhood 
Figure 2 | For the network in Fig. 1, (a) predicted coupling matrix for all
nodes except node #20. Time series from nodes #1 to #19 are available,
while node #20 is hidden. The predicted weights are indicated by color
coding and the amount of data used is $R_m = 0.7$. The abnormally dense
patterns in the 3rd and 7th rows suggest that nodes #3 and #7 are the
immediate neighbors of the hidden node. (b) Variance $\sigma^2$ of the predicted
coefficients for all accessible nodes, which is calculated using 20
independent reconstructions based on different segments of the data. The
variances associated with nodes #3 and #7 are apparently much larger than
those with the other nodes, confirming that these are the neighboring
nodes of the hidden node. There is a definite gap between the values of the
variance associated with neighboring and non-neighboring nodes of the
hidden node, as indicated by the two horizontal dashed lines in (b). When
the local noise sources are applied to nodes #7, #11 and #14, these three
nodes have similar dense bars in (a) and large variances in (b) (data are not
shown).



of the hidden node, we expect the variance to be small as the pre- 
dicted results hardly change when different segments of the time 
series are used. However, for the neighboring nodes of the hidden 
node, due to lack of complete information needed to construct the 
measurement matrix, the variance values can be much larger. 
Figure 2(b) shows the variance in the predicted coupling strength 
for all 19 accessible nodes. We observe that the values of the variance 
for the neighboring nodes of the hidden node, nodes #3 and #7, are all 
above the upper dashed line and are in fact significantly larger than 
those associated with all other nodes that all fall below the lower 
dashed line. This indicates strongly that they are indeed the neigh- 
boring nodes of the hidden node. The gap between the two dashed 
lines can be taken as a quantitative measure of the detectability of the
hidden node. The larger the gap, the more reliable it is to distinguish
the neighbors of the hidden node from the nodes that are not in the
neighborhood. The results in Fig. 2 thus indicate that the locations
of the hidden node(s) in the network can be reliably inferred even in
the presence of weak background noise. The size of the gap, or the
hidden-node detectability, depends on the system details. In
Supporting Information, we present results of a systematic analysis
of the detectability measure, where we find that the variance due to
the hidden nodes is mainly determined by the strength of their coupling
with the accessible nodes in the network. We also find that
system size and network topology have little effect on the hidden-node
detectability. It is worth emphasizing that the detectability
relies also on successful reconstruction of all nodes that are not in
the neighborhood of the hidden nodes, which determine the lower
dashed line in Fig. 2.
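The variance test and the gap between the two groups of nodes can be illustrated as follows. This is a hedged sketch: the exact definition of the variance used in Fig. 2(b) is not given here, so averaging the per-entry variance over each node's coefficient vector is an assumption, and `segment_variance` and `largest_gap_split` are hypothetical helper names.

```python
import numpy as np

def segment_variance(coeffs):
    """Per-node variance of reconstructed coefficients across data segments.

    coeffs: (S, N, P) array holding, for each of S segments and N nodes,
    a reconstructed coefficient vector of length P.
    Returns N values: variance across segments, averaged over the P entries.
    """
    return np.var(coeffs, axis=0).mean(axis=1)

def largest_gap_split(variances):
    """Split nodes at the largest gap between sorted log10-variances (a heuristic).

    Returns the indices of the high-variance group and the size of the gap
    in decades, which plays the role of a detectability measure.
    """
    order = np.argsort(variances)
    logs = np.log10(variances[order])
    k = int(np.argmax(np.diff(logs)))
    return order[k + 1:], logs[k + 1] - logs[k]
```

Nodes returned in the high-variance group are candidates for neighbors of the hidden node (or, as discussed later, nodes under strong local noise); a small gap suggests that no clear separation exists.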

To quantify the reliability of the reconstruction results, we investigate
how the prediction errors in the link weights of all accessible
nodes, except the predicted neighbors of the hidden node, change
with the data amount. For an existent link, we use the normalized
absolute error $E_{nz}$, the error in the estimated weight with respect to
the true one, normalized by the value of the true link weight. Figure 3
shows the results for N = 100. The link weights are uniformly dis- 
tributed in the interval [0.1, 0.5] and the background noise amplitude 
is c = 10"^'. The tolerance parameter in the compressive-sensing 
algorithm is set to be $\varepsilon = 0.5$, which is optimal for this noise amplitude.
(In Supporting Information we provide details of determining
the optimal tolerance parameter for different values of the background
noise amplitude.) We see that for $R_m > 0.4$, $E_{nz}$ decreases
to the small value of about 0.01, which is determined by background
noise level. As $R_m$ is increased further, the error is bounded by a small
value determined by the noise amplitude, indicating that the reconstruction
is robust. Although the value of $E_{nz}$ does not decrease
further toward zero due to noise, the prediction results are reliable
in the sense that the predicted weights and the real values agree with
each other, as shown in the inset of Fig. 3, a comparison of the actual
and the predicted weights for all existent links. All the predicted
results are in the vicinities of the corresponding actual values, as
indicated by a heavy concentration of the dots along the diagonal
line. The central region in the dot distribution has brighter color than
the marginal regions, confirming that the vast majority of the predicted
results are accurate. In Supporting Information, we further show that 
robust reconstruction can be achieved regardless of the network size, 
connection topology and weight distributions, insofar as sufficient 
data are available. 

The error measure $E_{nz}$ to characterize the accuracy of the reconstruction
is similar to z-scores, or the standard score in statistics, with
the minor difference being that z-scores use the standard deviation
of the distribution to normalize the raw scores, while we use the exact
values in our model examples. In realistic applications the exact
values are usually not available, so it is necessary to use the z-score
measure.
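The normalized error for existent links can be computed as in the short sketch below. Averaging over all existent links is an assumption made here, and `normalized_error` is a hypothetical helper name.

```python
import numpy as np

def normalized_error(w_pred, w_true):
    """Mean of |w_pred - w_true| / |w_true| over links whose true weight is non-zero."""
    w_pred = np.asarray(w_pred, dtype=float)
    w_true = np.asarray(w_true, dtype=float)
    mask = w_true != 0          # existent links only
    return np.mean(np.abs(w_pred[mask] - w_true[mask]) / np.abs(w_true[mask]))
```

For example, predicted weights 0.11 and 0.45 against true weights 0.1 and 0.5 each carry a 10% relative error, so the measure is about 0.1; non-existent links (true weight zero) are excluded.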

We emphasize that there are two types of "dense" connections: 
one from reconstruction and another intrinsic to the network. In 
Figure 3 | For random networks of size $N = 100$ with uniform weight
distribution in [0.1, 0.5], prediction error $E_{nz}$ associated with nonzero
coefficients of the dynamical equations of all nodes except for the
neighboring nodes of the hidden node, as a function of normalized data
amount R,„. The background noise amplitude is ^ = 10"' for all nodes. All 
data points are obtained from 10 independent realizations. Inset is a 
comparison of the predicted and actual weights for all existent links. Each 
dot represents one such link, and its x-value is the actual weight while the y-value
is the corresponding predicted result. The color for each dot is
determined by the dot density around it, where bright color signifies high
density. The arrow indicates the value of $R_m$ used in the comparison study.
The tolerance of the compressive-sensing algorithm is set to be $\varepsilon = 0.5$.

particular, in the two-dimensional representation of the reconstruction
results [e.g., Fig. 2(a)], the neighboring nodes of the hidden
node typically appear densely linked to many other nodes in the
network. These can be a result of lack of complete information
(i.e., time series) due to the hidden node (in this case, there is indeed a
hidden node), or the intrinsic dense connection pattern associated
with, for example, a hub node in a scale-free network. Our idea of
examining the variances of the reconstructed connections from independent
data segments is for distinguishing these two possibilities.
As we have demonstrated, extensive computations indicate that a
combination of the dense connection and large variance can ascertain
the existence of a hidden node reliably.

Differentiating hidden node from local noise sources. When 
strong noise sources are present at certain nodes, the predicted 
coupling patterns of the neighboring nodes of these nodes will 
show anomalies. (Here by "strong" we mean that the amplitudes 
of the random disturbances are order-of-magnitude larger than 
that of background noise.) We now demonstrate that our 
proposed method based on the cancellation ratio is effective at 
distinguishing hidden nodes from local noise sources, insofar as 
the hidden node has at least two neighboring nodes not subject to 
such disturbances. To be concrete, we choose a network of $N = 61$
coupled chaotic Rössler oscillators, which has 60 accessible nodes
and one hidden node (#61) that is coupled to two neighbors: nodes
#14 and #20, as shown schematically in Fig. 4. Assume a strong noise
source is present at node #54. We find that the reconstructed weights
match their true values to high accuracy. We also find that the
reconstructed coefficients including the ratio $\Omega_{ij}$ are all constant
and invariant with respect to different data segments, a strong
signal that the pair of nodes are the neighboring nodes of the same
hidden node, thereby confirming its existence.

When there are at least two accessible nodes in the neighborhood 
of the hidden node which are not subject to strong noisy disturbance, 
such as nodes #14 and #20, as the data amount is increased 
towards 100%, the cancellation ratio should also increase and 
Figure 4 | Schematic illustration of a hidden node and its coupling
configuration with two neighbors in a random network of $N = 61$ nodes,
where 60 are accessible. A strong noise source is present at node #54.

approach unity. This behavior is shown by the open circles in 
Fig. 5(a). However, when a node is driven by a local noise source, 
regardless of whether it is in the neighborhood of the hidden node, 
the cancellation ratio calculated from this node and any other access- 
ible node in the network will show a characteristically different beha- 
vior. Consider, for example, nodes #14 and #54. The reconstructed 
connection patterns of these two nodes both show anomalies, as they 
appear to be coupled with all other nodes in the network. In contrast 
to the case where the pair of nodes are influenced by the hidden node 
only, here the cancellation ratio does not show any appreciable 
increase as the data amount is increased, as shown by the crosses 
in Fig. 5(a). In addition, the average variance values of the predicted 
coefficient vectors of the two nodes exhibit characteristically differ- 
ent behaviors, depending on whether any one node in the pair is 
driven by strong noise or not. In particular, for the node pair #14 and 
#20, since neither is under strong noise, the average variance will 
decrease toward zero as $R_m$ approaches unity, as shown in Fig. 5(b)







Figure 5 | For the network described in Fig. 4, (a) predicted values of the
cancellation ratio $\Omega_{ij}$ obtained from the differential signal of two
neighboring nodes of the hidden node (#14 and #20, indicated by circles) 
and from the differential signal of nodes #14 and #54, where the latter is 
driven by noise of amplitude i; = 10"^ (crosses), (b) Average variance 
values of the predicted local coefficient vectors for the two combinations. 
The background noise amplitude is ^ = 10" \ The results are obtained from 
20 independent realizations. 






(open circles). In contrast, for the node pair #14 and #54, the average
variance will increase with $R_m$, as shown in Fig. 5(b) (crosses). This is
because, when one node is under strong random driving, the input to
the compressive-sensing algorithm will be noisy and its performance
will deteriorate. However, compressive sensing can perform reliably
when the input data are "clean," even when they are sparse.
Increasing the data amount beyond a threshold is not necessarily
helpful, and longer, noisy data sets can significantly degrade the
performance. The results in Figs. 5(a,b) thus demonstrate that the
cancellation ratio between a pair of nodes, in combination with
the average variance of the predicted coefficient vectors associated
with the two nodes, can effectively distinguish a hidden node from a
local noise source. If there is more than one hidden node or there is
a cluster of hidden nodes, the procedure to estimate the cancellation
factors is similar but requires additional information about the
neighboring nodes of the hidden nodes. Our cancellation-factor
based method can be extended to network systems with nodal 
dynamics not of the continuous-time type, such as evolutionary- 
game dynamics. See Supporting Information for details. 

Discussion 

Our program to differentiate a hidden node from local noise sources 
and then to infer its existence can be summarized in the following
steps: 

• Collect time series of dynamical variables from accessible nodes; 

• Hypothesize suitable expansion bases for nodal dynamics and 
coupling functions, taking advantage of physical understanding 
of the underlying networked dynamical system; 

• Construct the measurement matrix and derivative vector from
time series, and solve for the expansion-coefficient vector using compressive
sensing;

• Identify all nodes with abnormally dense connections, and calculate
the corresponding variances using independent segments
of the available time series to eliminate the hub nodes in the
network (for those nodes the variances will be much smaller than
those of the neighboring nodes of the hidden node or nodes under
strong local noise);

• For all the remaining nodes with abnormally dense connections, 
calculate the cancellation ratio for all possible node pairs and also 
the average variance of the predicted coefficient vectors using 
non-overlapping time-series segments for a series of systematically
increasing values of the data amount $R_m$;

• Identify the neighboring nodes of the hidden node as those with
cancellation ratios approaching unity and the average variance
tending to zero as $R_m$ is increased. For those pairs with cancellation
ratio not increasing and/or the average variance not
decreasing with $R_m$, one node in the pair is under the driving of
a local noise source. 
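The final decision step in the list above can be sketched as a simple rule on the two curves obtained for a node pair. The function name and both thresholds below are illustrative assumptions, not values from the paper; in practice the curves would be inspected as in Fig. 5.

```python
def classify_pair(ratios, variances, ratio_tol=0.1, var_drop=0.5):
    """Classify one node pair from its curves versus increasing data amount R_m.

    ratios:    estimated cancellation ratios, ordered by increasing R_m.
    variances: average variances of the predicted coefficient vectors,
               in the same order.
    ratio_tol and var_drop are hypothetical thresholds chosen for illustration.
    """
    # The ratio should rise and end close to unity.
    ratio_converges = abs(ratios[-1] - 1.0) <= ratio_tol and ratios[-1] > ratios[0]
    # The variance should shrink substantially as more data are used.
    variance_shrinks = variances[-1] <= var_drop * variances[0]
    if ratio_converges and variance_shrinks:
        return "hidden-node neighbors"
    return "local noise"
```

Usage: `classify_pair([0.2, 0.6, 0.9, 0.98], [0.4, 0.2, 0.05, 0.01])` labels the pair as neighbors of a hidden node, whereas a flat ratio with growing variance is labeled as affected by a local noise source.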

Detecting hidden nodes in complex networks with a priori 
unknown nodal dynamics, topology, and coupling weights has vast 
application potential, such as in social and biological networks. 
Inferring the existence of a hidden node in the presence of local random
perturbations is an extremely challenging problem. Our efforts
represent a step forward in this area of research, where much further 
work is needed. 



Methods 

Compressive-sensing based method to uncover network dynamics and topology. 

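We consider the typical setting of a complex network of $N$ coupled oscillators in a
noisy environment. The dynamics of each individual node, when it is isolated from
other nodes, can be described as $\dot{\mathbf{x}}_i = \mathbf{F}_i(\mathbf{x}_i) + \xi\boldsymbol{\eta}_i$, where $\mathbf{x}_i \in \mathbb{R}^m$ is the vector of state
variables, $\boldsymbol{\eta}_i$ is an $m$-dimensional vector whose entries are independent
Gaussian random variables of zero mean and unit variance, and $\xi$ denotes the noise
amplitude. A weighted network can be described by

$$\dot{\mathbf{x}}_i = \mathbf{F}_i(\mathbf{x}_i) + \sum_{j=1, j \neq i}^{N} \mathbf{W}_{ij} \cdot [\mathbf{H}(\mathbf{x}_j) - \mathbf{H}(\mathbf{x}_i)] + \xi\boldsymbol{\eta}_i, \qquad (1)$$

where $\mathbf{W}_{ij} \in \mathbb{R}^{m \times m}$ is the coupling matrix between node $i$ and node $j$, and $\mathbf{H}$ is the
coupling function. Defining

$$\mathbf{F}'_i(\mathbf{x}_i) \equiv \mathbf{F}_i(\mathbf{x}_i) - \sum_{j=1, j \neq i}^{N} \mathbf{W}_{ij} \cdot \mathbf{H}(\mathbf{x}_i),$$

we have

$$\dot{\mathbf{x}}_i = \mathbf{F}'_i(\mathbf{x}_i) + \sum_{j=1, j \neq i}^{N} \mathbf{W}_{ij} \cdot \mathbf{H}(\mathbf{x}_j) + \xi\boldsymbol{\eta}_i, \qquad (2)$$

i.e., we have grouped all terms directly associated with node $i$ into $\mathbf{F}'_i(\mathbf{x}_i)$. We can then
expand $\mathbf{F}'_i(\mathbf{x}_i)$ into the following form:

$$\mathbf{F}'_i(\mathbf{x}_i) = \sum_{\gamma} \tilde{\mathbf{a}}_i^{(\gamma)} \cdot \tilde{\mathbf{g}}_i^{(\gamma)}(\mathbf{x}_i), \qquad (3)$$

where $\tilde{\mathbf{g}}_i^{(\gamma)}(\mathbf{x}_i)$ are a set of orthogonal and complete base functions chosen such that
the coefficients $\tilde{\mathbf{a}}_i^{(\gamma)}$ are sparse. While the coupling function $\mathbf{H}(\mathbf{x}_i)$ can be expanded in a
similar manner, for simplicity we assume that it is linear: $\mathbf{H}(\mathbf{x}_i) = \mathbf{x}_i$. We then
have

$$\dot{\mathbf{x}}_i = \sum_{\gamma} \tilde{\mathbf{a}}_i^{(\gamma)} \cdot \tilde{\mathbf{g}}_i^{(\gamma)}(\mathbf{x}_i) + \sum_{j=1, j \neq i}^{N} \mathbf{W}_{ij} \cdot \mathbf{x}_j + \xi\boldsymbol{\eta}_i, \qquad (4)$$

where all the coefficients $\tilde{\mathbf{a}}_i^{(\gamma)}$ and weights $\mathbf{W}_{ij}$ need to be determined from time series
$\mathbf{x}_i$. In particular, the coefficient vector $\tilde{\mathbf{a}}_i^{(\gamma)}$ determines the nodal dynamics and the
weighted matrices $\mathbf{W}_{ij}$ give the full topology and coupling strength of the entire
network.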

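Suppose we have simultaneous measurements of all state variables $\mathbf{x}_i(t)$ and $\mathbf{x}_i(t + \delta t)$
at $M$ different uniform instants of time at interval $\Delta t$ apart, where $\delta t \ll \Delta t$ so that
the derivative vector $\dot{\mathbf{x}}_i$ can be estimated at each time instant. Equation (4) for all $M$
time instants can then be written in a matrix form with the following measurement
matrix:

$$\mathbf{G}_i = \begin{pmatrix}
\tilde{\mathbf{g}}_i(t_1) & \mathbf{x}_1(t_1) & \cdots & \mathbf{x}_k(t_1) & \cdots & \mathbf{x}_N(t_1) \\
\tilde{\mathbf{g}}_i(t_2) & \mathbf{x}_1(t_2) & \cdots & \mathbf{x}_k(t_2) & \cdots & \mathbf{x}_N(t_2) \\
\vdots & \vdots & & \vdots & & \vdots \\
\tilde{\mathbf{g}}_i(t_M) & \mathbf{x}_1(t_M) & \cdots & \mathbf{x}_k(t_M) & \cdots & \mathbf{x}_N(t_M)
\end{pmatrix}, \qquad (5)$$

where the index $k$ in $\mathbf{x}_k(t)$ runs from 1 to $N$, $k \neq i$, and each row of the matrix is
determined by the available time series at one instant of time. The derivatives at
different times can be written in a vector form as $\mathbf{X}_i = [\dot{\mathbf{x}}_i(t_1), \cdots, \dot{\mathbf{x}}_i(t_M)]^T$, and the
coefficients from the functional expansion and the weights associated with all links in
the network can be combined concisely into a vector $\mathbf{a}_i$ as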



$$\mathbf{a}_i = [\tilde{\mathbf{a}}_i, \mathbf{W}_{i1}, \cdots, \mathbf{W}_{i,i-1}, \mathbf{W}_{i,i+1}, \cdots, \mathbf{W}_{iN}]^T, \qquad (6)$$

where $[\cdot]^T$ denotes the transpose. For properly chosen expansion base and a general
complex network whose connections are typically sparse, the vector $\mathbf{a}_i$ to be determined
is sparse as well. Finally, Eq. (4) can be written as

$$\mathbf{X}_i = \mathbf{G}_i \cdot \mathbf{a}_i + \xi\boldsymbol{\eta}_i. \qquad (7)$$

In the absence of noise or if the noise amplitude is negligibly small, Eq. (7) represents a
linear equation but the dimension of the unknown coefficient vector $\mathbf{a}_i$ can be much
larger than that of $\mathbf{X}_i$, and the measurement matrix will have many more columns
than rows. In order to fully recover the network of $N$ nodes with each node having $m$
components, it is necessary to solve $N \times m$ such equations.
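How the measurement matrix and derivative vector might be assembled for one component of one node is sketched below. This is an illustration only: it treats the x-component of node $i$ with a third-order polynomial basis and z-to-x coupling as in the Results, estimates derivatives by central differences on uniformly sampled data (an assumption that differs from the $t$, $t + \delta t$ sampling described above), and uses hypothetical function names.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_basis(states, order=3):
    """Monomials x^a y^b z^c with a + b + c <= order, one row per sample.

    states: (M, 3) array of one node's [x, y, z] samples.
    Returns an (M, K) array; K = 20 for order 3.
    """
    cols = [np.ones(len(states))]
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(3), degree):
            cols.append(np.prod(states[:, list(idx)], axis=1))
    return np.column_stack(cols)

def measurement_system(traj, i, dt, order=3):
    """Build the measurement matrix and derivative vector for the x-equation of node i.

    traj: (T, N, 3) array of sampled states with uniform spacing dt.
    Returns (G, X, others), where the trailing columns of G hold the
    z-components of the nodes listed in `others`.
    """
    x_i = traj[:, i, 0]
    dxdt = (x_i[2:] - x_i[:-2]) / (2.0 * dt)   # central-difference derivative
    inner = traj[1:-1]                          # states at the same instants
    basis = polynomial_basis(inner[:, i, :], order)
    others = [k for k in range(traj.shape[1]) if k != i]
    coupling = inner[:, others, 2]              # z-components of the other nodes
    return np.hstack((basis, coupling)), dxdt, others
```

Dropping a node from `traj` before calling `measurement_system` mimics the incomplete information caused by a hidden node.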

Recovering signal from noisy measurement with compressive sensing algorithm. 

The system of linear equations in Eq. (7) is ill defined. However, since $\mathbf{a}_i$ is sparse,
insofar as its number of non-zero coefficients is smaller than the dimension of $\mathbf{X}_i$, the
vector $\mathbf{a}_i$ can be uniquely and efficiently determined by the compressive-sensing
algorithm. In particular, in the equation $\mathbf{X} = \mathbf{G} \cdot \mathbf{a} + \xi$, reliable recovery of the $P$-dimensional
sparse vector $\mathbf{a}$ is achievable, where $\mathbf{X} \in \mathbb{R}^{Q}$ and
$\mathbf{G} \in \mathbb{R}^{Q \times P}$ but $P \gg Q$. A sufficiently sparse vector $\mathbf{a}$ can be reconstructed by solving the
following $l_1$ regularization problem:

$$\min \|\mathbf{a}\|_{l_1}, \quad \text{subject to} \quad \|\mathbf{G} \cdot \mathbf{a} - \mathbf{X}\|_{l_2} \le \varepsilon, \qquad (8)$$

where the $l_1$ norm of a vector $\mathbf{x}$ is defined as $\|\mathbf{x}\|_{l_1} = \sum_i |x_i|$, its $l_2$ norm is
$\|\mathbf{x}\|_{l_2} = \sqrt{\sum_i |x_i|^2}$, and $\varepsilon$ is the threshold value determined by the noise amplitude. The
reconstructed vector $\hat{\mathbf{a}}$ lies within the range $\|\hat{\mathbf{a}} - \mathbf{a}\|_{l_2} \le C\varepsilon$, where $C$ is a constant.
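A small numerical illustration of sparse recovery is given below. Note the substitution: instead of the constrained problem in Eq. (8), this sketch solves the closely related penalized (LASSO) form with iterative soft thresholding (ISTA), chosen here only because it needs nothing beyond NumPy. The penalty `lam` and the iteration count are illustrative assumptions, and this is not the solver used in the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal map of the l1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(G, X, lam=1e-2, n_iter=10000):
    """Minimise 0.5 * ||G a - X||_2^2 + lam * ||a||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(G, 2) ** 2   # 1 / Lipschitz constant of the gradient
    a = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = G.T @ (G @ a - X)
        a = soft_threshold(a - step * grad, step * lam)
    return a
```

With a random 50-by-100 matrix and a vector with three non-zero entries, such a solver typically recovers the correct support from only 50 measurements, which is the property that keeps the required data amount small.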

Detection of hidden node. To motivate our consideration, we note that a meaningful 
solution of Eq. (7) based on compressive sensing requires that the derivative vector $X_i$ 
and the measurement matrix $G_i$ be entirely known which, in turn, requires time series 
from all nodes. In this case, we say that the information available for reconstruction of the 
complex networked system is complete. In the presence of a hidden node, for its 
immediate neighbors, the available information will not be complete in the sense that 
some entries of the vector $X_i$ and the matrix $G_i$ are now unknown. Let $h$ denote 
the hidden node. For any neighboring node of $h$, the vector $X_i$ and the matrix $G_i$ in Eq. 
(7) now contain unknown entries at the locations specified by the index $h$. For any 
other node not in the immediate neighborhood of $h$, Eq. (7) is unaffected. When the 
compressive-sensing algorithm is used to solve Eq. (7), there will then be large errors 
in the solution of the coefficient vector $a_i$ associated with the neighboring nodes of $h$, 
regardless of the amount of data used. In general, the so-obtained coefficient vector $a_i$ 
will not appear sparse. Instead, most of its entries will be nonzero, a manifestation of 
which is that the node would appear to have links with almost every other node in the 
network. In contrast, for nodes not in the neighborhood of $h$, the corresponding 
errors will be small and can be reduced by increasing the data amount, and the 
corresponding coefficient vector will be sparse. It is this observation that makes 
identification of the neighboring nodes of the hidden node possible in the noiseless or 
weak-noise situations. 
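
The criterion described above can be sketched as a simple sparsity check on the reconstructed coefficient vectors. This is a minimal sketch, assuming that the vectors have already been obtained from some sparse solver; the function names (`density`, `flag_dense_nodes`), the magnitude cutoff `tol`, the density threshold `max_density`, and the example vectors are all hypothetical and would need to be chosen for the system at hand.

```python
def density(coeffs, tol=1e-3):
    """Fraction of entries whose magnitude exceeds tol."""
    return sum(1 for c in coeffs if abs(c) > tol) / len(coeffs)


def flag_dense_nodes(coeff_vectors, tol=1e-3, max_density=0.5):
    """Indices of nodes whose reconstructed coefficient vector is not sparse."""
    return [i for i, a in enumerate(coeff_vectors)
            if density(a, tol) > max_density]


# Made-up reconstructed vectors for three nodes.
vectors = [
    [0.0, 0.0, 1.2, 0.0, 0.0, 0.0],     # sparse: one apparent link
    [0.3, -0.2, 0.5, 0.1, -0.4, 0.05],  # dense: apparent links to every node
    [0.8, 0.0, 0.0, 0.0, -0.6, 0.0],    # sparse: two apparent links
]
flagged = flag_dense_nodes(vectors)
```

Here only the second node (index 1) is flagged as a candidate neighbor of a hidden node. A density check of this kind cannot by itself separate a hidden node from a strong local noise source, which is the issue taken up next.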

To appreciate the need to distinguish the effects of the hidden node 
from those of noise, we can separate the term associated with $h$ in Eq. (4) from those 
associated with other accessible nodes in the network. Letting $i$ denote a node in the immediate 
neighborhood of the hidden node $h$, we have 



(9) 



where $G'_i$ is the new measurement matrix that can be constructed from all available 
time series. While background noise may be weak, the term $W_{ih} \cdot x_h$ can in general be 
large in the sense that it is comparable in magnitude with other similar terms in Eq. 
(4). Thus, when the network is under strong noise, especially for those nodes that are 
connected to the neighboring nodes of the hidden node, the effects of the hidden node on 
the solution can be entangled with those due to noise. In addition, if the coupling 
strength from the hidden node is weak, it would be harder to identify the neighboring 
nodes. For example, a hidden node in a network with a Gaussian weight distribution will 
be harder to detect, due to the finite probability of the occurrence of near-zero weights. 
When the coupling strength is comparable to or smaller than the background noise 
amplitude, the corresponding link cannot be detected. See Supporting Information 
for details. 

Method to distinguish hidden nodes from local noise sources - a mathematical 
formulation. For simplicity, we assume that all coupled oscillators share the same 
coupling scheme and that each oscillator is coupled to any of its neighbors through 
one component of the state vector only. Thus, each row in the coupling matrix $W_{ih}$ 
associated with a link between nodes $i$ and $h$ has only one non-zero element. Let $p$ 
denote the component of the hidden node coupled to the first component of node $i$, 
the dynamical equation of which can then be written as 



(10) 

where $[x_h]_p$ denotes the time series of the $p$th component of the hidden node, which is 
unavailable, and $w_{ih}^{1p}$ is the coupling strength between the hidden node and node $i$. 
The dynamical equation of the first component of neighboring node $j$ of the hidden 
node has a similar form. Letting 



be the cancellation ratio, multiplying the equation of node $j$ by $\Omega_{ij}$, and subtracting 
from it the equation for node $i$, we obtain 



(12) 



We see that the terms associated with $[x_h]_p$ vanish and all deterministic terms on the 
left-hand side of Eq. (12) are known, so that the equation can be solved by the compressive-sensing 
method. From the coefficient vector so estimated, we can identify the couplings of 
nodes $i$ and $j$ to other nodes, except for the coupling between themselves, since such 
terms have been absorbed into the nodal dynamics. In addition, the couplings to their 
common neighborhood are degenerate in Eq. (12) and cannot be separated from each 
other. Effectively, we have combined the two nodes together by introducing the 
cancellation ratio $\Omega_{ij}$. 
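
The cancellation can be checked numerically on synthetic data. In the sketch below the ratio is taken to be $\Omega_{ij} = w_{ih} / w_{jh}$, which is the value that removes the hidden term when each equation has the form (observable terms) + (weight) × (hidden signal). That form, the signals, the weights and the function name `cancel_hidden` are assumptions made for illustration. The weights are supplied here only to exhibit the cancellation; in practice they are unknown and the ratio has to be estimated from data.

```python
import math


def cancel_hidden(dx_i, dx_j, w_ih, w_jh):
    """Multiply node j's equation by the ratio and subtract node i's equation."""
    omega = w_ih / w_jh
    combined = [omega * dj - di for di, dj in zip(dx_i, dx_j)]
    return omega, combined


times = [0.1 * t for t in range(50)]
hidden = [math.sin(3.0 * t) for t in times]   # inaccessible signal [x_h]_p
known_i = [math.cos(t) for t in times]        # observable part of node i's equation
known_j = [0.5 * t for t in times]            # observable part of node j's equation
w_ih, w_jh = 1.0, 0.5                         # hypothetical coupling strengths

dx_i = [f + w_ih * h for f, h in zip(known_i, hidden)]
dx_j = [f + w_jh * h for f, h in zip(known_j, hidden)]

omega, combined = cancel_hidden(dx_i, dx_j, w_ih, w_jh)

# The same combination built from observable terms only.
hidden_free = [omega * fj - fi for fi, fj in zip(known_i, known_j)]
max_gap = max(abs(c - e) for c, e in zip(combined, hidden_free))
```

The combined series agrees with the hidden-free combination up to round-off, so the hidden signal no longer enters the equation to be solved.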

To give a concrete example, we consider the situation where each oscillator has 
three independent dynamical variables, named $x$, $y$ and $z$. For the nodal and 
coupling dynamics we choose polynomial expansions of order up to $n$. The $x$ 
component of the nodal dynamics $[F_i(x_i)]_x$ for node $i$ is: 



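$$[F_i(x_i)]_x = \sum_{l_x=0}^{n} \sum_{l_y=0}^{n} \sum_{l_z=0}^{n} [(a_i)_x]_{l_x l_y l_z} \, x_i^{l_x} y_i^{l_y} z_i^{l_z}$$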

and the coupling from other node $k$ to the $x$ component can be written as 

where $w_{ik}^{xy}$ denotes the coupling weight from the $y$ component of node $k$ to the $x$ 
component of node $i$, and so on. The nodal dynamical terms in the matrix $G_i$ are 
the monomials $x_i^{l_x} y_i^{l_y} z_i^{l_z}$ 
and the corresponding coefficients are $[(a_i)_x]_{l_x l_y l_z}$. The vector of coupling weights is 
$[W_{ik}]_x = [w_{ik}^{xx}, w_{ik}^{xy}, w_{ik}^{xz}]$. Equation (12) becomes 

(13) 



where $c$ is the sum of constant terms from the dynamical equations of nodes $i$ and $j$, 
and $a'_i$ is the coefficient vector to be estimated, which excludes all the constants. Using 
compressive sensing to solve Eq. (13), we can recover the cancellation ratio $\Omega_{ij}$ and the 
equations of node $i$. When $\Omega_{ij}$ is known, we can then recover the dynamics of node $j$ 
from the coefficient vector $-\Omega_{ij} a'_j$. 
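
A toy version of this estimation step is sketched below for two scalar nodes with linear nodal dynamics and no constant terms. Everything in it is an assumption made for illustration: the linear model, the ordering of the unknowns as (nodal coefficient of node $i$, $\Omega_{ij}$, $-\Omega_{ij}$ times the nodal coefficient of node $j$), the synthetic values, and the helper `solve_linear`. An exactly determined 3 × 3 system solved by Gaussian elimination stands in for the compressive-sensing step, which would be needed in the realistic underdetermined case.

```python
def solve_linear(A, b):
    """Solve a small square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


# Ground truth, used only to generate synthetic data (hypothetical values).
a_i, a_j = -0.5, -1.0
w_ih, w_jh = 1.0, 2.0
samples = [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (1.0, 1.0, 0.0)]  # (x_i, x_j, hidden)

dx_i = [a_i * xi + w_ih * h for xi, xj, h in samples]
dx_j = [a_j * xj + w_jh * h for xi, xj, h in samples]

# Hidden-free relation: dx_i = a_i * x_i + Omega * dx_j + (-Omega * a_j) * x_j
rows = [[xi, dj, xj] for (xi, xj, h), dj in zip(samples, dx_j)]
est_a_i, est_omega, est_cross = solve_linear(rows, dx_i)
est_a_j = -est_cross / est_omega
```

The solve returns the nodal coefficient of node $i$ and the ratio (0.5 here, equal to `w_ih / w_jh`) without any use of the hidden signal, and dividing the third unknown by the negative of the ratio gives the nodal coefficient of node $j$.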

In Supporting Information we provide an analysis and discussion of the possible 
extension of our method to systems with characteristically different nodal 
dynamics and/or with multiple hidden nodes. In particular, we show that the method 
can be readily adapted to network systems whose nodal dynamics are not described 
by continuous-time differential equations but by discrete-time processes such as 
evolutionary-game dynamics. In such a case, the derivatives used for continuous-time 
systems can be replaced by the agent payoffs. The cancellation factors can then be 
calculated from data to differentiate the hidden nodes from local noise sources. We 
also show that, under certain conditions with respect to the coupling patterns between 
the hidden nodes and their neighboring nodes, the cancellation factors can be estimated 
even when there are multiple, entangled hidden nodes in the network. 